Vector databases

A vector database is a database designed to store, index, and search high-dimensional vectors efficiently. It is typically used to hold embeddings, that is, numerical representations of data points in a high-dimensional space.

Vector databases excel at operations that standard databases struggle with, such as nearest-neighbour search in high-dimensional spaces. This operation is fundamental to many machine learning applications, including recommendation systems, image recognition, and natural language processing.

Example

Let’s consider a machine learning model that converts images into high-dimensional vectors (also known as embeddings), where each dimension captures a different feature of the image. To find images similar to a given image, the system would search for vectors close to the vector of the given image. With a high-dimensional space and a large number of vectors, this operation can be computationally expensive. Vector databases are optimized for this kind of operation, enabling efficient similarity searches.
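
As a rough illustration of the search described above, here is a minimal brute-force sketch in plain Python. The embedding values and the names `cosine_similarity`, `nearest_neighbours`, and `embeddings` are made up for this example. It compares the query against every stored vector, which is exactly the cost a vector database tries to avoid by indexing.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query, vectors, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional "image embeddings"; real ones usually have
# hundreds or thousands of dimensions.
embeddings = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
}

results = nearest_neighbours([1.0, 0.0, 0.0], embeddings, k=2)
print(results)  # the two cat vectors rank above the car vector
```

Every query here touches every vector, so the cost grows linearly with both the number of vectors and their dimensionality.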

Examples

Databases

  1. Qdrant: A vector similarity search engine and vector database. It provides a production-ready service with a convenient API to store, search, and manage points (vectors with an additional payload). Qdrant is tailored to extended filtering support, which makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications. Written in Rust.

  2. Milvus: An open-source vector database designed specifically for AI and machine learning applications. It supports a variety of distance metrics and is scalable, reliable, and capable of handling hybrid (vector and scalar) search. Written primarily in Go, with performance-critical parts in C++.

  3. Weaviate: An open-source vector database that stores both objects and vectors. It combines vector search with structured filtering and offers the fault tolerance and scalability of a cloud-native database. It is accessible through GraphQL, REST, and various language clients. Written in Go.

  4. Deeplake: Deep Lake is a vector database powered by a storage format optimized for deep-learning and large language model (LLM) based applications. It offers storage for all data types (embeddings, audio, text, videos, images, PDFs, annotations, etc.), querying and vector search, data streaming while training models at scale, data versioning and lineage for all workloads, and integrations with popular tools such as LangChain, LlamaIndex, and Weights & Biases. Written in Python.
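
Several of the databases above combine vector search with filtering on attached metadata (a payload, or scalar fields). The sketch below shows the idea in plain Python. It is not any of these products' APIs: the point layout and the names `filtered_search` and `must_match` are assumptions made for illustration. It also filters first and ranks afterwards, whereas real engines typically integrate filtering into their index.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filtered_search(query, points, must_match, k=3):
    """Keep points whose payload matches every key/value in must_match,
    then return the ids of the k closest ones, nearest first."""
    candidates = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must_match.items())
    ]
    candidates.sort(key=lambda p: euclidean(query, p["vector"]))
    return [p["id"] for p in candidates[:k]]

# Hypothetical points: each has an id, a vector, and a payload.
points = [
    {"id": 1, "vector": [0.1, 0.9], "payload": {"category": "shoes", "in_stock": True}},
    {"id": 2, "vector": [0.2, 0.8], "payload": {"category": "shoes", "in_stock": False}},
    {"id": 3, "vector": [0.15, 0.85], "payload": {"category": "hats", "in_stock": True}},
    {"id": 4, "vector": [0.9, 0.1], "payload": {"category": "shoes", "in_stock": True}},
]

hits = filtered_search([0.1, 0.9], points, {"category": "shoes", "in_stock": True}, k=2)
print(hits)  # ids of in-stock shoes only, nearest first
```

Points 2 and 3 are close to the query in vector space but are excluded because their payloads do not match the filter.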

Libraries

  1. FAISS (Facebook AI Similarity Search): Developed by Facebook, FAISS isn’t a database per se but a library for efficient similarity search of high-dimensional vectors. It is often used in combination with a traditional database to manage the vector part of data.

  2. Annoy (Approximate Nearest Neighbors Oh Yeah): Developed by Spotify, Annoy is a C++ library with Python bindings that supports efficient search of approximate nearest neighbors.

  3. NMSLIB (Non-Metric Space Library): An efficient cross-platform library for nearest neighbor search in generic non-metric spaces.

  4. NGT (Neighborhood Graph and Tree): Developed by Yahoo! Japan, NGT provides high-speed search algorithms for nearest neighbors.

  5. ScaNN (Scalable Nearest Neighbors): A library for efficient vector similarity search, released by Google Research.

  6. Vectra and Vectra-py: Local vector databases with features similar to Pinecone, but backed by local files.
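
Most of the libraries above trade a little accuracy for speed by searching approximately rather than exhaustively. The sketch below illustrates one generic technique, locality-sensitive hashing with random hyperplanes, in plain Python. It is not how any particular library listed here works, since each uses its own, more sophisticated index structures. All function names and parameters are hypothetical.

```python
import random

def make_hyperplanes(dim, n_planes, seed=0):
    """Draw n_planes random normal vectors; each defines a hyperplane through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

def signature(vector, hyperplanes):
    """One boolean per hyperplane: which side of it the vector lies on."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0.0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group vector ids into buckets keyed by signature."""
    buckets = {}
    for vec_id, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(vec_id)
    return buckets

def squared_distance(a, b):
    """Squared Euclidean distance; enough for ranking."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def approximate_search(query, vectors, buckets, hyperplanes, k=3):
    """Rank only the vectors that share the query's bucket, closest first."""
    candidates = buckets.get(signature(query, hyperplanes), [])
    ranked = sorted(candidates, key=lambda vec_id: squared_distance(query, vectors[vec_id]))
    return ranked[:k]

# Hypothetical data: 200 random 16-dimensional vectors.
rng = random.Random(42)
vectors = {f"v{i}": [rng.gauss(0.0, 1.0) for _ in range(16)] for i in range(200)}

hyperplanes = make_hyperplanes(dim=16, n_planes=4)
buckets = build_index(vectors, hyperplanes)

# Querying with a stored vector: it always falls into its own bucket.
results = approximate_search(vectors["v7"], vectors, buckets, hyperplanes)
print(results)
```

Only the query's bucket is scanned, so the search inspects a fraction of the data. The cost is that a true neighbour sitting just across a hyperplane is missed, which is why results are approximate.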

References

For searching over many vectors quickly, the OpenAI documentation recommends using a vector database. Examples of working with vector databases and the OpenAI API are available [in the OpenAI Cookbook](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases) on GitHub.

Vector database options include:

- [Pinecone](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases/pinecone), a fully managed vector database
- [Weaviate](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases/weaviate), an open-source vector search engine
- [Redis](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases/redis) as a vector database
- [Qdrant](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases/qdrant), a vector search engine
- [Milvus](https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb), a vector database built for scalable similarity search
- [Chroma](https://github.com/chroma-core/chroma), an open-source embeddings store
- [Typesense](https://typesense.org/docs/0.24.0/api/vector-search.html), fast open source vector search
- [Zilliz](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases/zilliz), data infrastructure, powered by Milvus

#machine-learning

#database #embeddings

Page last modified: 2024-11-13 14:01:29